Reducing the OOV rate in broadcast news speech recognition
Thomas Kemp, Alex Waibel
Interactive Systems Laboratories, ILKD
University of Karlsruhe, 76128 Karlsruhe, Germany

ABSTRACT
The recognition of broadcast news is a challenging problem in speech recognition. To achieve the long-term goal of robust, real-time news transcription, several problems have to be overcome, e.g. the variety of acoustic conditions and the unlimited vocabulary. In this paper we address the problem of unlimited vocabulary. We show that this problem is more serious for German than it is for English. Using a speech recognition system with a large vocabulary, we dynamically adapt the active vocabulary to the topic of the current news segment. This is done by applying information retrieval (IR) techniques to a large collection of texts automatically gathered from the internet. The same technique is also used to adapt the language model of the recognition system. The process of vocabulary adaptation and language model retraining is completely unsupervised. We show that dynamic vocabulary adaptation can significantly reduce the out-of-vocabulary (OOV) rate and improve the word error rate of our broadcast news transcription system View4You.

1. THE VIEW4YOU SYSTEM
The View4You project is a cooperation between the Interactive Systems Labs and the Informedia group at Carnegie Mellon University [5]. It aims at the automatic generation of a searchable multilingual video database. In the prototype system, German and Serbo-Croatian TV news shows are recorded daily and stored as MPEG-compressed files. Using the acoustic signal, a segmenter splits the newscasts into acoustically homogeneous segments ranging from several seconds to a few minutes in length. A speech recognition system generates transcriptions for the segments. The segmentation information and the automatic transcriptions are stored in a database. Users of the system can pose queries in natural language, e.g. 'Tell me everything about the peace talks between Mr Netanyahu and Mr Arafat'.
Using the speech recognizer's transcriptions in the multimedia database, an information retrieval component computes a ranked list of relevant segments, which is displayed to the user. Clicking on a segment starts an MPEG player that plays the corresponding video segment. For more details on the View4You system, see [1].

2. MOTIVATION
The index into View4You's video database consists of the output of our speech recognizer. Therefore, only words in the recognizer's vocabulary can be searched for. If a video contains keywords that are unknown to the recognizer, those keywords cannot be found in the index, and the user cannot retrieve the video with them. OOV (out-of-vocabulary) words therefore pose a problem for the View4You system, and the vocabulary of the speech recognizer should be as large as possible to keep OOV rates low. Currently, our speech recognizer is limited to a vocabulary of 64k words. On the North American Business News (NAB) corpus, a vocabulary of this size covers more than 99% of the text, and even with a 20k vocabulary, OOV rates on the NAB task do not exceed 3%. We measured the OOV rate on German news shows for a 60k vocabulary derived from our language model corpus. The result is shown in figure 1.
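The OOV rate discussed above is the percentage of running words in a text that are missing from the recognizer's vocabulary. The following minimal sketch assumes whitespace-tokenized text. It computes that rate and also illustrates the abstract's idea of adapting the vocabulary with IR:

- a first-pass hypothesis serves as a query against a text collection;
- every word of the best-matching text is added to the active vocabulary.

Everything in the sketch is a hypothetical stand-in: the function names (`oov_rate`, `top_k_vocabulary`, `adapt_vocabulary`), the TF-IDF/cosine ranking and the toy German data. This section does not describe the paper's actual retrieval method or word-selection criteria. Unlike View4You's fixed 64k limit, the toy also simply grows the vocabulary rather than swapping words out.

```python
import math
from collections import Counter


def oov_rate(tokens, vocabulary):
    """OOV rate in percent: share of running words not in the vocabulary."""
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return 100.0 * unknown / len(tokens)


def top_k_vocabulary(corpus_tokens, k):
    """Hypothetical fixed vocabulary: the k most frequent words of a text corpus."""
    return {word for word, _ in Counter(corpus_tokens).most_common(k)}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adapt_vocabulary(base_vocab, hypothesis, documents, n_docs=1):
    """Hypothetical IR-based adaptation: rank documents by TF-IDF cosine
    similarity to the first-pass hypothesis and add every word of the
    n_docs best matches to the active vocabulary (no size limit enforced)."""
    n = len(documents)
    doc_freq = Counter(word for doc in documents for word in set(doc))
    # Smoothed IDF; words that occur in no document get weight 0 below.
    idf = {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}

    def tfidf(tokens):
        return {w: c * idf.get(w, 0.0) for w, c in Counter(tokens).items()}

    query = tfidf(hypothesis)
    ranked = sorted(documents, key=lambda doc: cosine(query, tfidf(doc)), reverse=True)
    adapted = set(base_vocab)
    for doc in ranked[:n_docs]:
        adapted.update(doc)
    return adapted


# Toy data (hypothetical): LM training text, a segment's reference transcript,
# a first-pass hypothesis (in-vocabulary words only) and "web" texts.
train = "die regierung hat heute die steuern gesenkt die regierung spricht ueber frieden".split()
segment = "die regierung spricht heute mit netanjahu und arafat".split()
hypothesis = "die regierung spricht heute ueber frieden".split()
web_texts = [
    "frieden in nahost netanjahu und arafat verhandeln ueber frieden".split(),
    "die bundesliga spielt heute fussball".split(),
    "die steuern und die abgaben steigen".split(),
]

base_vocab = top_k_vocabulary(train, k=100)
adapted_vocab = adapt_vocabulary(base_vocab, hypothesis, web_texts, n_docs=1)
print(oov_rate(segment, base_vocab))     # 50.0
print(oov_rate(segment, adapted_vocab))  # 12.5
```

In this toy run, the Middle East text ranks highest, so `netanjahu`, `und` and `arafat` become known. The segment's OOV rate drops from 50% to 12.5%. Only `mit` stays unknown, because no retrieved text contains it.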